…1801)
The bundler's import path used to claim an agent_id, mutate pipelines/flows/agent_config across
multiple unwrapped wpdb writes, and only check sub-step results for the agent insert. Under SQLite
contention on Studio that produced "success: true" responses with a populated agent_id summary while
the underlying rows were silently rolled back, leaving operators with an `Agent "..." not found` on
the very next CLI call.
This change wraps the post-claim work in a transaction with try/catch + manual rollback, throws on
any sub-step write failure, re-fetches the agent row after the final update_agent and verifies the
artifact registry actually persisted, and surfaces typed errors (`install_post_claim_failure`,
`install_invalid_bundle`, `install_slug_collision`, `install_bundle_slug_mismatch`) instead of bare
strings. The manual_rollback() path is the safety net for engines that silently no-op ROLLBACK; it
deletes only rows this call inserted, never pre-existing ones.
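A minimal sketch of the try/catch + manual rollback shape described above, not the actual `AgentBundler` code. `FakeStore`, `import_with_rollback()`, and the `$track` callback are hypothetical names; the store is an in-memory stand-in for the wpdb-backed repositories, and its `rollback()` is deliberately a no-op to imitate an engine that ignores ROLLBACK.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical in-memory stand-in for the wpdb-backed repositories.
 * rollback() does nothing on purpose, imitating an engine that no-ops ROLLBACK.
 */
final class FakeStore
{
    /** @var array<string, array<int, array<string, mixed>>> */
    public array $tables = ['agents' => [], 'pipelines' => [], 'flows' => []];
    private int $next_id = 1;

    public function begin(): void {}
    public function commit(): void {}
    public function rollback(): void {} // silent no-op

    public function insert(string $table, array $row): int
    {
        $id = $this->next_id++;
        $this->tables[$table][$id] = $row;
        return $id;
    }

    public function delete(string $table, int $id): void
    {
        unset($this->tables[$table][$id]);
    }
}

/**
 * Runs the post-claim work inside a transaction. On any throwable it issues
 * a rollback, then manually deletes only the rows this call inserted.
 */
function import_with_rollback(FakeStore $store, callable $work): array
{
    $inserted = []; // [table, id] pairs created by this call
    $track = static function (string $table, int $id) use (&$inserted): void {
        $inserted[] = [$table, $id];
    };

    $store->begin();
    try {
        $work($store, $track);
        $store->commit();
        return ['success' => true];
    } catch (\Throwable $e) {
        $store->rollback();
        // Safety net: undo our own inserts, newest first.
        foreach (array_reverse($inserted) as [$table, $id]) {
            $store->delete($table, $id);
        }
        return [
            'success'    => false,
            'error_code' => 'install_post_claim_failure',
            'message'    => $e->getMessage(),
        ];
    }
}

// Usage: a pre-existing agent row must survive a failed import.
$store    = new FakeStore();
$existing = $store->insert('agents', ['slug' => 'pre-existing']);

$result = import_with_rollback($store, static function (FakeStore $s, callable $track): void {
    $track('agents', $s->insert('agents', ['slug' => 'new-agent']));
    $track('pipelines', $s->insert('pipelines', ['name' => 'p1']));
    throw new \RuntimeException('update_flow returned false');
});
```

Tracking inserted IDs separately from the transaction is what lets the cleanup work even when ROLLBACK does nothing, and it is why rows that existed before the call are never touched.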
`agent upgrade` now passes `is_upgrade => true` so an existing agent row is treated as the upgrade
target instead of returning "Agent slug already exists. Use --slug=<new-slug> to rename on import."
when a bundle's artifacts lack `portable_slug` and the live pipelines/flows have been edited
(`local_modified`). Conflicts come back through `result.conflicts` and the CLI hands them to the
existing PendingActions staging path, matching what `agent diff` already reports.
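One way to picture the guard behavior with `is_upgrade`, as a sketch only. `resolve_slug_claim()`, the shape of the `$existing` row, and the order of the checks are assumptions made for illustration; only the error codes come from this change.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical guard: decides what an existing agent row means for this import.
 *
 * @param array{agent_id: int, bundle_slug: string}|null $existing Row already using the agent slug.
 * @return array{ok: bool, agent_id?: int, error_code?: string}
 */
function resolve_slug_claim(?array $existing, string $bundle_slug, bool $is_upgrade): array
{
    if ($existing === null) {
        return ['ok' => true]; // fresh install, nothing to collide with
    }
    if ($existing['bundle_slug'] !== $bundle_slug) {
        // Same agent slug but a different bundle: never overwrite it.
        return ['ok' => false, 'error_code' => 'install_bundle_slug_mismatch'];
    }
    if (!$is_upgrade) {
        // Plain install against an existing agent: point the operator at upgrade.
        return ['ok' => false, 'error_code' => 'install_slug_collision'];
    }
    // Upgrade: reuse the existing row as the target.
    return ['ok' => true, 'agent_id' => $existing['agent_id']];
}

// Usage with an illustrative existing row.
$existing = ['agent_id' => 8, 'bundle_slug' => 'newsroom'];
$install  = resolve_slug_claim($existing, 'newsroom', false);
$upgrade  = resolve_slug_claim($existing, 'newsroom', true);
$foreign  = resolve_slug_claim($existing, 'other-bundle', true);
```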
Defensive cleanup: `InstalledBundleArtifacts::register()` listens on `datamachine_agent_deleted`
and wipes any tracked rows for the deleted agent_id. The importer does not write to that table
today, but extensions can — and a stale row would mis-classify a fresh install as an upgrade
against a non-existent agent.
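The delete-hook cleanup can be sketched as follows. This is illustrative rather than the real `InstalledBundleArtifacts` class: `ArtifactRegistrySketch` and `SketchHooks` are hypothetical, the rows live in memory instead of a table, and the assumption that the action passes the agent ID as its first argument is not confirmed by this description. The `add_action()` / `do_action()` stubs exist only so the example runs outside WordPress.

```php
<?php
declare(strict_types=1);

// Minimal stand-ins for WordPress's add_action()/do_action().
final class SketchHooks
{
    /** @var array<string, list<callable>> */
    public static array $callbacks = [];
}
if (!function_exists('add_action')) {
    function add_action(string $hook, callable $callback): void
    {
        SketchHooks::$callbacks[$hook][] = $callback;
    }
}
if (!function_exists('do_action')) {
    function do_action(string $hook, ...$args): void
    {
        foreach (SketchHooks::$callbacks[$hook] ?? [] as $callback) {
            $callback(...$args);
        }
    }
}

final class ArtifactRegistrySketch
{
    /** @var list<array{agent_id: int, artifact: string}> */
    public static array $rows = [];

    public static function register(): void
    {
        add_action('datamachine_agent_deleted', [self::class, 'delete_for_agent']);
    }

    // Drops every tracked row that belongs to the deleted agent.
    public static function delete_for_agent(int $agent_id): void
    {
        self::$rows = array_values(array_filter(
            self::$rows,
            static fn(array $row): bool => $row['agent_id'] !== $agent_id
        ));
    }
}

// Usage: deleting agent 8 leaves agent 9's row untouched.
ArtifactRegistrySketch::register();
ArtifactRegistrySketch::$rows = [
    ['agent_id' => 8, 'artifact' => 'pipeline:daily'],
    ['agent_id' => 9, 'artifact' => 'flow:digest'],
];
do_action('datamachine_agent_deleted', 8);
```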
Also adds two test seam actions, `datamachine_bundle_import_post_claim_started` and
`datamachine_bundle_import_pre_commit`, so tests can throw inside the critical section without
needing a live SQLite race to reproduce the bug.
Tests: tests/Unit/Core/Agents/AgentBundlerImportTest.php covers all three regression shapes:
silent partial success → rollback + typed error, upgrade-against-local-modified → success with
conflicts, agent delete → bundle artifact registry cleared.
Fixes #1801.
Summary
- `AgentBundler::import()`: any failure after the agent row is claimed now returns
  `success: false` with a typed `error_code` and rolls back every row this call inserted.
- `agent upgrade` against `local_modified` artifacts now succeeds and surfaces conflicts instead of
  rejecting with "Agent slug already exists" when the bundle's artifacts lack `portable_slug`.
- Clears `datamachine_bundle_artifacts` rows for an agent on `datamachine_agent_deleted` so a
  subsequent install is classified as a fresh install, not a stale upgrade.

Root cause
`import()` did the following in sequence with no transaction and no post-write verification:

1. `agents_repo->update_agent()` or `create_if_missing()` to claim `agent_id`.
2. `pipelines_repo->create_pipeline()` / `update_pipeline()` per pipeline.
3. `flows_repo->create_flow()` / `update_flow()` per flow.
4. `agents_repo->update_agent()` again to write the artifact registry into `agent_config`.
5. Return `success: true` with the populated summary.

Most sub-step return values were dropped on the floor. Under SQLite contention on Studio (the
repro environment in the issue), step 4 silently rolled back without surfacing an error. The
in-memory bundler thought everything wrote; a fresh `SELECT` showed no agent row. Because the
return value still claimed `agent_id: 8` (or 10, 11, …), operators saw success but `agent list`
came up empty. The next install bumped the auto-increment without persisting either, until
eventually the engine accepted the writes and the agent finally appeared.

A separate but related shape: `agent upgrade` of an agent whose live pipeline/flow had been edited
(legit operator action) errored with "Agent slug ... already exists. Use --slug=<new-slug> to
rename on import." The upgrade entrypoint passed straight through to `import()`, which hit the
install-collision guard whenever `--slug` was supplied or the bundle's artifacts lacked
`portable_slug` (which made `bundle_has_portable_artifacts()` return false). Operators had no
clean upgrade path; the workaround was destructive delete + reinstall.
Changes
`inc/Core/Agents/AgentBundler.php`

- Wraps the post-claim work in `START TRANSACTION` / `COMMIT` with a `try/catch \Throwable` that
  runs `ROLLBACK` and a manual cleanup of rows this call inserted.
- Throws `\RuntimeException` on every previously-silent sub-step failure (`update_agent` → false,
  `create_pipeline` → 0, `update_flow` → false, etc.).
- Re-fetches the agent row after the final `update_agent` and verifies the artifact registry
  persisted before declaring success. SQLite under Studio has been observed silently dropping
  these mutations; the verify-then-commit step closes the silent-rollback door.
- Adds an `is_upgrade` option. When set, the slug-collision guard treats an existing row as the
  upgrade target instead of erroring. The bundle-slug mismatch guard still fires so we don't
  overwrite an unrelated agent that happens to share the slug.
- Returns typed error codes: `install_invalid_bundle`, `install_slug_collision`,
  `install_bundle_slug_mismatch`, `install_post_claim_failure`. The error message on
  `install_slug_collision` now points operators at `agent upgrade` as the right next step.
- Adds two test seam actions (`datamachine_bundle_import_post_claim_started`,
  `datamachine_bundle_import_pre_commit`) so tests can throw inside the critical section without
  relying on a live SQLite race.

`inc/Cli/Commands/AgentBundleCommand.php`

- `upgrade()` passes `is_upgrade => true` to `import()` so the install-collision guard does not
  fire on legitimate upgrades against `local_modified` artifacts.

`inc/Core/Database/BundleArtifacts/InstalledBundleArtifacts.php`

- New `register()` method hooks `datamachine_agent_deleted` to wipe any tracked artifact rows
  for the deleted agent_id. The importer does not write to that table today, but extensions
  can, and a stale row would mis-classify a fresh install as an upgrade against a
  non-existent agent.

`data-machine.php`

- Calls `InstalledBundleArtifacts::register()` alongside the other bundle-layer registrations.

`tests/Unit/Core/Agents/AgentBundlerImportTest.php` (new)

- `test_post_claim_failure_rolls_back_and_reports_typed_error`: fault-injects via
  `datamachine_bundle_import_pre_commit`, asserts the response is `success: false` with
  `error_code: install_post_claim_failure` and that no agent / pipeline / flow rows remain.
- `test_upgrade_against_existing_agent_does_not_error_on_slug_collision`: installs a clean
  bundle, mutates the live pipeline, re-imports with `is_upgrade => true`, asserts success and
  that the existing agent_id is reused.
- `test_agent_delete_clears_bundle_artifact_registry`: inserts an `InstalledBundleArtifacts`
  row, fires `datamachine_agent_deleted`, asserts the row is gone.

Reproduction notes
The repro in the issue is two pieces:
1. Repeatedly `agent install <bundle>`: observe `success: true` with a populated `agent_id` while
   `agent list` doesn't show the agent. After this change, that scenario either persists fully or
   returns `success: false` with `error_code: install_post_claim_failure`. The new test exercises
   the failure mode deterministically by injecting a throw inside the critical section instead of
   waiting for the SQLite race.
2. After editing live pipeline/flow rows for an installed agent, re-run `agent upgrade <bundle>`.
   Previously: "Agent slug ... already exists." After this change: success, with conflicts
   surfaced so the CLI can stage them as PendingActions.

Verified behavior

- `php tests/agent-bundle-portable-update-smoke.php`: passes.
- `php tests/agent-bundle-format-smoke.php`: passes.
- `php tests/agent-bundle-runtime-drift-smoke.php`: passes.
- `php tests/import-agent-ability-smoke.php`: passes.
- `php tests/export-agent-ability-smoke.php`: passes.
- `php -l` clean across all touched files; `phpcs -p` reports no errors on changed files.
- The failure in `tests/agent-bundle-upgrade-planner-smoke.php` (workspace scope error inside the
  planner's PendingActionStore path) is unchanged from `main` and unrelated to this change,
  verified by `git stash`-ing the diff and rerunning.
- The new tests in `tests/Unit/Core/Agents/AgentBundlerImportTest.php` are runnable via the
  existing `WP_UnitTestCase` harness; they require a live MySQL test DB, so they execute in CI or
  via `composer test`.

AI assistance
error semantics, and the rollback contract; the AI did not run the live install repro on Studio.